[WIP] Export merge tree partition to object storage#939

Closed
arthurpassos wants to merge 50 commits into antalya-25.6.5 from export_mt_part_to_object_storage

@arthurpassos arthurpassos commented Jul 28, 2025

Collaborator

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Implement exporting partitions from MergeTree tables to object storage in a different format (e.g., Parquet). The files are converted to the destination format in memory.

Syntax: ALTER TABLE merge_tree_table EXPORT PARTITION ID 'ABC' TO TABLE 's3_hive_table'.

Related settings: allow_experimental_export_merge_tree_partition.

  1. For now, the destination file names and paths are decided by the destination engine (I am only testing and thinking about S3 with hive, so <table_root>/pkey1=pvalue1/.../pkeyn=pvaluen/<snowflakeid>.parquet). Most likely we will not use snowflake IDs for the file names in the future.
  2. A commit file should be uploaded at the end of the execution to signal the completion of the transaction. The file name is commit_<partition_id>_<transaction_id>, and it shall contain the list of files that were uploaded in that transaction.
  3. A partition cannot be exported twice. The limitation comes from the fact that, upon re-export, we have no reliable way of telling which parts should be exported (we can't duplicate data). For example, parts might have been merged with un-exported parts.
  4. The parts selected for an export are not locked at all. We just keep references so they are not deleted from disk; it is totally OK to mutate or merge them in the meantime.
  5. Exports should be able to recover from hard failures/disasters (hard restart or crash). This is controlled using export manifests that are written to disk.
  6. Exports should be able to recover from soft failures (i.e., failing to export a given part without crashing).
  7. Upon restart, exports are scheduled based on when they were created.
  8. For now, exports are scheduled in the same list as disk moves. I still need to decide whether to create yet another queue or reuse one of the existing ones.
  9. Export manifests are being written on anyDisk.
  10. There is some half-baked observability in system.exports and system.part_log.
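
To make points 1, 2 and 7 easier to picture, here is a rough Python sketch of the destination layout, the commit file, and the restart ordering. It is only an illustration and not the code in this PR: the function names, the `ExportManifest` fields, the newline-separated commit body, and the oldest-first ordering are all assumptions (the description only says the commit file lists the uploaded files and that exports are scheduled based on when they were created). Escaping of partition values is ignored.

```python
from dataclasses import dataclass


def destination_key(table_root, partition_values, file_id, extension="parquet"):
    """Build <table_root>/pkey1=pvalue1/.../pkeyn=pvaluen/<file_id>.<extension>.

    partition_values is an ordered list of (key, value) pairs.
    Hypothetical helper; partition values are not escaped here.
    """
    segments = [f"{key}={value}" for key, value in partition_values]
    return "/".join([table_root.rstrip("/"), *segments, f"{file_id}.{extension}"])


def commit_file_name(partition_id, transaction_id):
    """Name of the commit file: commit_<partition_id>_<transaction_id>."""
    return f"commit_{partition_id}_{transaction_id}"


def commit_file_body(uploaded_keys):
    """Commit file contents.

    Assumption: one uploaded file path per line. The PR description only
    states that the file contains the list of uploaded files.
    """
    return "\n".join(uploaded_keys) + "\n"


@dataclass
class ExportManifest:
    """Hypothetical on-disk manifest record; field names are illustrative."""
    partition_id: str
    transaction_id: str
    create_time: int  # e.g. seconds since epoch


def recovery_order(manifests):
    """Order pending exports by creation time (assumed oldest first)."""
    return sorted(manifests, key=lambda m: m.create_time)


if __name__ == "__main__":
    key = destination_key("table_root", [("year", 2025), ("month", 7)], 123456789)
    print(key)
    print(commit_file_name("ABC", "tx1"))
    print(commit_file_body([key]), end="")
```

A reader of the destination table could then treat a partition as complete only once the matching commit file exists, and use its contents to decide which data files belong to the transaction.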

Documentation entry for user-facing changes

...

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache


Labels

antalya, antalya-25.8, enhancement (New feature or request), tiered storage (Antalya Roadmap: Tiered Storage)

Development

Successfully merging this pull request may close these issues.

ALTER TABLE EXPORT to external table

3 participants